MEDIANS AND MEANS IN FINSLER GEOMETRY 



MARC ARNAUDON AND FRANK NIELSEN 



Abstract. We investigate existence and uniqueness of p-means e p and the 
median e\ of a probability measure fi on a Finsler manifold, in relation with 
the convexity of the support of fi. We prove that e p is the limit point of 
a continuous time gradient flow. Under some additional condition which is 
always satisfied for p > 2, a discretization of this path converges to e p . This 
provides an algorithm for determining those Finsler center points. 



The geometric barycenter of a set of points is the point which minimizes the 
sum of the squared distances to these points. It is the most traditional estimator is 
statistics that is however sensitive to outliers [16] . Thus it is natural to replace the 
average distance squaring (power 2) by taking the power of p for some p € [1,2). 
This leads to the definition of p-means. When p = 1, the minimizer is the median of 
the set of points, very often used in robust statistics |16) . In many applications, p- 
means with some p € (1,2) give the best compromise. For existence and uniqueness 
in Riemannian manifolds under convexity conditions on the support of the measure, 
see Afsari pQ. 

The Fermat- Weber problem concerns finding the median e\ of a set of points in 
an Euclidean space. Numerous authors worked out algorithms for computing e%. 
The first algorithm was proposed by Weiszfeld in [32] (see also [31]). It has been 
extended to sufficiently small domains in Riemannian manifolds with nonnegative 
curvature by Fletcher and al. in [13] . A complete generalization to manifolds 
with positive or negative curvature (under some convexity conditions in positive 
curvature) , has been recently given by Yang in [34) . 

The Riemannian barycenter or Karcher mean of a set of points in a manifold 
or more generally of a probability measure has been extensively studied, see e.g. 
[IT] . [18] . [19] . [II], [28], [4], [9], where questions of existence, uniqueness, stability, 
relation with martingales in manifolds, behavior when measures are pushed by 
stochastic flows have been considered. The Riemannian barycenter corresponds 
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to p = 2 in the above description. Computation of Ricmannian barycenters by 
gradient descent has been performed by Le in [21] . 

The aim of this paper is to extend to the context of Finsler manifolds the results 
on existence and uniqueness of p-means of probability measures, as well as algo- 
rithms for computing them. Some convexity is needed, and as we shall see the fact 
that comparison results for triangles as Alexandroff and Toponogov theorems do 
not exist impose more restrictions on the support of the probability measure. As a 
consequence, the sharp results on existence and uniqueness established by Afsari [T] 
and the algorithm for computing means of Yang in [33] do not extend to Finsler 
manifolds. 

The motivation for this work primarily comes from signal filtering and denoising 
in the context of Diffusion Tensor Imaging (DTI), High Angular Resolution Imag- 
ing (HARDI, see [30], [27], [5]), Orientation Distribution Function (ODF), active 
contours |23) . Applications with experimental results of an implementation will be 
reported in forthcoming papers. 

Information geometry at its heart considers the differential geometry nature of 
probability distributions induced by a divergence function. In probability theory, 
invariance by monotonic re-parameterization and sufficient statistics yields the class 
of /-divergences [2] If(p,q) = J p(x)f(^&)dx that includes the Kullback-Leibler 

(KL) information-theoretic divergence KL(p, q) = J p(x) log ^ydx as its prominent 
member (for f(t) = — logt). It is well-known that the KL divergence (better known 
as the relative entropy) yields a dually flat structure [2] generalizing the (self-dual) 
Euclidean space. 

Because divergences are usually asymmetric and violate the triangle inequal- 
ity they have not been extensively considered from an algorithmic point of view. 
Indeed, the triangle inequality property is often used in computational geometry 
to design efficient algorithms by allowing various "pruning" techniques [101 122) . 
Computational geometry has thus mostly considered metric spaces for keeping the 
triangle inequality properties. 

One can metrize divergences. The KL divergence can be symmetrized either into 
the Jeffreys divergence J(p,q) = KL(p,q) +KL(q,p) or the Jensen-Shannon (JS) 
divergence: 



The latter is preferred in practice because it is bounded, and its square root yields 
a metric that can be embedded into a Hilbert space [Hj . 

Finsler distances, arising from the underlying the Finsler metrics, are attractive 
as they preserve the triangle inequality |30| for efficient algorithmics but potentially 
model asymmetric distances. 

In information geometry, the regular divergence D associated to a Finslerian 
metric distance d can be defined as D(p, q) = d 2 (p, q). Observe that the Finslerian- 
based divergence looses then the triangle inequality property [30]. (eg., the squared 
Euclidean distance does not satisfy the triangle inequality). 



JS(p,g)=KL(p, 



p + q 

2 



) + KL(g, 



p + q 
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2. Preliminaries 

Let M be a smooth manifold. On M we consider a Finsler structure F : TM —> 
R+. For any x € M, V, X,Y,Z G T X M such that V" ^ 0, let 

1 <9 2 

(2.1) ffv (x,y):=-— — F 2 (y + s x + <y). 

v ; 2 9s9t («,t)=(0,0) 
(we shall also use the notation <X, Y>y = gy(X,Y)) and 

1 <9 3 

(2.2) <X,y,Z>y — F 2 (V + rX + sF + tZ). 

4 OrOSOt (r,s,t)=(0,0,0) 

We have 

(2-3) <X, y, Z> v = ~ <?y +rX (Y, Z) 

2 or r=o 

and in particular since -F 2 is 2- homogeneous and V i— > gv(X, Y) is 0- homogeneous, 

(2.4) <V,Y,Z> V = Q. 

Let V be a non- vanishing vector field on M. The Chern connection V v is torsionfree 
and almost metric, and can be characterized by 

(2.5) X<Y, Z> v = <W X Y, Z> v + <Y, V V X Z> V + 2<W X V, Y, Z> v . 
More precisely, parameterizing locally TM by coordinates 

(x 1 ,...,x m ,y 1 = dx 1 ,...,y m = dx m ), 
defining the geodesic coefficients as 



y G TM\{0}, 



V-V) 1 ij - 775 I — + — ~ — 



letting 

i fir) r) 

then the Christoffel symbols of the Chern connection are given by 

1 kl ( S 9ij Sgu &9i: 
2 y \Sx l 5xi Sx l 

(see [5]). Note that defining 

(2.9) 8y l = dif + Ni(y)dx j 
we have for a smooth function / : TM\{0} — » K 

(2.10) df = + !4<yy*. 

ox 1 ay' 

The Chern curvature tensor is defined by the equation 

(2.11) i? y (X,F)Z := V X VyZ- VyVx Z - V{x,Y] Z > 

and the flag curvature is 

(2.12) X { V, W) := <^^W> 

for two non collinear V,W £ T X M. 

We say that M has nonpositive flag curvature if for all V, W, Jf(V, W) < 0. 
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(2.13) 3T V {W) = <VZW -V^W,V>^ 



The tangent curvature of two vectors V, W G T X M is denned as 

'V 

where W is a vector field satisfying W x = W . For a nonnegative constant 5 > we 
say that & > —5 or 3? < S if respectively 

(2.14) ST V {W) > -5F{V)F{Wf or 3T V {W) < 5F{V)F{W) 2 . 

For x G M we define 



,„ K1 \ <v,v> v <v,v> w 

(2.15) C €[x) = sup w , 20 (x) = sup w . 

v,weT x M\{0} V u.iueTajMXfO} V <v,v> v 

Remark 2.1. For applications in active contours, a "Wulff shape" is given which 
does not depend on x and defines the Finsler structure. From this shape & and ^ 
can easily be calculated. See e.g [53] and [33] . 

A geodesic in M is a curve i H> c(i) satisfying for all t, V^|*|c = 0. It is well 
known that a geodesic has constant speed, and that it locally minimizes the distance 
([8]). If so, letting p(x,y) the forward distance from x to y, then 

(2.16) p 2 (x,y) = <c(0) 1 c(0)> m 

where t H> c(t) is the minimal geodesic satisfying c(0) = x and c(l) = y. By 
definition, the backward distance from x to y is p(y,x). 

For a; 6 M and v G T X M, we let whenever it exists exp 2 .(v) := c(l) where 
t h-> c(t) is the geodesic satisfying c(0) = u. 

If M is complete, analytic, simply connected and has nonpositive flag curvature 
(we say that M is an analytic Cartan-Hadamard manifold), then exp^. : T X M — > M 
is an homeomorphism ([6] theorem 4.7). Under these assumption, letting for x,y G 
M, xtj = exp~ 1 (y), we have 

(2.17) p 2 (x,y) = <xy',xy> x ^. 

For xq G M and R > 0,, let us denote by B(xq, R) (resp. B(xo, R)) the (forward) 
open (resp. closed) ball with center xq and radius R: 
(2.18) 

B{x ,R) = {ye M, p(x ,y) < R} (resp. B(x , R) = {y G M, p(x , y) < R}) 

Now let (t,s) n- c(t,s) a family of minimizing geodesies t i— > c(t,s), t G [0,1], 
parametrized by s G /, / an interval in R. Define 

(2.19) £( S ) = ip 2 (c(0, S ),c(l, S )). 

The computation of -E'(s) and E"(s) is well-known, see e.g. [TJ. We have 

(2.20) £'(s) = <a.c(l, s), 9 t c(l, s)> dtc(hs) - <d s c(0, s), d t c(0, s)> atc(0 , s) . 

As for the second derivative, letting c = c(-, 0), and for X, Y vector fields along c 

(2.21) I(X, Y)= f (<V?X, V?y> T - <R T {X, T)T, Y> T ) dt 

Jo 

the index of X and Y, we have 
(2.22) 

e"(o) = <vJ c c ( ( i: ) ) 9 sC (i,-),a tC (i,o)> atC(1:0) - <vJ c j ( ) ) a sC (o,.),a tC (o,o)> atc(0:n) 

+ I(d s c(;0),d sC (;0)). 
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Assuming s H> c(0, s) and s H> c(l, s) are geodesies, we obtain 

(2.23) E"(0) = ^ atc( o ) o)(5 s c(0,0))-^a t c(i,o)(9 s c(l,0)) + J(9 s c(-,0),9 s cO,0)). 

We are interested in the situation where c(l, s) = z a constant. In this case we 
have 

(2.24) E"(0) = ^ tc(0> o)(9 s c(0 ) 0))+J(a s c(-,0),9 s c(-,0)). 
For p > 1, define 

(2.25) D p (a)=p"(c(0,»),z) 

Proposition 2.2. Assume J(f < k, & > -S, c £ < C for some k,S > 0, C > 1. 
Let p > 1. TTien writing r — p(x, z), 

(2.26) ^'(0) > pr^ (rain L - 1, ^gffi ) <T a - fr) . 
// 2 and a; = c(0, 0) satisfy p(x, z) < R(p, k, (5, C) with 

(2.27) R(p, k, 6, C) = min (^r, -j= arctan (^j ^ 

and the injectivity radius at x is strictly larger than R(p, k, 5, C), then -Dp (0) > 0. 
Remark 2.3. Note if p > 2 then 

R{p, k, 6, C) = R(2, k, 6,G) = -j= arctan ■ 

Proof. Dehne T(t) = d t c(t,0), J{t) = d s c(t,0), 

j T (t) = ^ <m,T(t)> nt) T(t), J N (t) = j(t) - j T ( t ). 

Using successively [7J Lemma 9.5.1 which compares the index I(J, J) with the one 
of its "transplant" into a manifold with constant curvature k 2 and the index lemma 
[7J Lemma 7.3.2 which compares the index of the transplant to the one of the Jacobi 
field with same boundary values, we get, letting r — p(x, z) = Z?i(0), 

(2.28) I(J, J) > ^os(Vkr) <jN{Qh jN {q)>t{o) + <jT(0)i jT {0)> 

sm(v kr) 

Using the expression (|2.24[) for £"'(0) we obtain 

(2.29) E"(0) > -Sr + ^L^M. < ^ (0 ), J»> T(0) + < J T (0), J T (0)> T(0) . 

sm(v kr ) 

We have from ([2^20]) 

(2.30) E'(0) 2 = r 2 <J T (0), J T (0)> T(0) . 



Now from Di(s) = y / 2E(s) we get 

(2 - 31) D ^ s) = Dwr ^^w^^w^' 
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and this yields 

£>p(o) 

= P D 1 (oy- 2 ((p-i)i?i(o) 2 + Ui(o)u?(o)) 

= prP- 2 ((p - 2)< J T (0), J T (0)> T(0) + £"(0)) 

> ^~ 2 f(p - 1)< J T (0), J T (0)> T{0) -Sr+ ^os(Vkr) <jN{ol jN {Q) \ 

> P r^ (nun - 1, < AO), J(0)>. ( o) - *r) 

> ^~ 2 fmin (p - 1, V " rC ° S ^ r) ^| G~ 2 - ^ . 

V V sin(Vfcr) y J 

^From this bound the rest of the proof follows easily. □ 

Similarly, we can obtain an upper bound for Dp(0): 

Proposition 2.4. Assume the sectional curvatures have a lower bound —0 1 
for some (3 > 0, and ST < 6' for some 8' > Q, S> < D for some D > 1. Again let 
r = p[x,z), assume that the injectivity radius at x is larger than r. Then 

(2.32) D£(0) < pr p - 2 (D 2 max(p- l,/?rcoth(/3r)) +S'r) . 
Proof. We have by (|2~!M|) and (|2~3T|) together with the fact that 

(2.33) I(J, J) = < J T (0), J T (0)> + I(J N , J N ), 

D;(0) = prP- 2 (p - 1)<J T (0), J T (0)> T(0) + I(J N , J N ) + ,9t{J)) ■ 

Let 1 1-> X(t) the parallel vector field along t H> c(i, 0) with initial condition J N (0), 
and for t G [0, 1], let 

G(i) = cosh(r/3i) - coth (r/3) sinh(r/3i). 

This is the solution of G" = r/3G with conditions G(0) = 1 and G(l) = 0. The 
vector field t i-» F(i) along t 1— > c(i, 0) defined by 

(2.34) Y{t) = G(t)X(t) 

has same boundary values as t i-> J N (t), so by the index lemma [7] Lemma 7.3.2 
we have 

(2.35) I(J N ,J N )<I(Y,Y). 
On the other hand 

I(Y,Y) 

1 (G'(<) 2 <J w (0), J»> T(0) -G(i) 2 <i? T (X(<),T(<))T(i),X(t)> T(t) ) dt 

< <J Ar (0), J JV (0)> T( o) / (G'(i) 2 +r 2 /3 2 G(i) 2 ) eft 
Jo 

= <J N (0), J n (0)>t(o) {[G'{t)G{t)]l + / G(t) {-G"{tf + r 2 /3 2 G(t)) dt 



<J A, (0),J Ar (0)> T(0) r/3coth(r/3). 
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So 

(2.36) 
^(0) 

< prP~ 2 ((p - 1)< J T (0), J T (0)> T(0) + r/3 coth(r/3)< j"(0), J Ar (0)> T(0) ) + <5'r 
<pr p ~ 2 (max((p- 1), r/?coth(r/3)) < J(0), J(0)> T(0) + <5V) 
<pr p ~ 2 (Z> 2 max((p- 1), r^coth(r/3)) +S'r) 
since F(J(0)) = 1. 

□ 

For x G M, let £ x : T X M — > T*M be the Legendre transformation, denned as 

(2.37) e x (V)=g v (V,-) if V ? 0, 4(0) = 0. 

It is well-known that 4 is a bijection. The global Legendre transformation on TM 
is defined as 

(2.38) J?(V) = t n(v) (V) 

where ir : TM — > M is the canonical projection. If we define the dual Minkowski 
norm F* on T*M as 

(2.39) F*(0 = nua{€(y), 2/ G T X M, = 1}, 
then 

(2.40) F = F* o Jzf 
and for non zero V G TM and a G T*M, 

(2.41) (JSf(V0,V) =T(F) 2 , (a^V)) =T» 2 
(see e.g. [3]). 

For / a C 1 function on M we may define the gradient of / 

(2.42) grad/^^- 1 ^/). 

3. Forward p-means 

Let fi be a compactly supported probability measure in M. For p > 1 and x € M 
we define 

(3.1) # ft ,(a;) = / p p {x,z) p(dz). 

JM 

The (forward) p-mean of p is the point e p of M where S^^ v reaches its minimum 
whenever it exists and is unique. 

In this paper we will consider forward p-means and we will call them p-means. 
Similarly we could define the backward p-mean V p as the point which minimizes 

x i-> & n, v {x) := / p p (z,x) fi(dz). 

JM 

Depending on the context, forward or backward mean is more appropriate. One 
should note that defining the reverse (or adjoint) Finsler structure F(v) = F(—v), 
v G TM, it is easy to check that the associated distance p satisfies p(z,x) = 
p{x, z), and forward p-mean for F is backward p-mean for F. So without loss of 
generality we can consider only the forward p-means. 



s 



M. ARNAUDON AND F. NIELSEN 



One should also note that in High Angular Resolution Imaging the Finsler struc- 
ture is symmetric, so both notions coincide. It is not the case for the application 
concerning active contours where it is natural to consider non symmetric F. 

Even if it is in a non-Finslerian context, one can give the example of right- 
sided and left-sided Kullback-Leibler divergences for families of Gaussian probabil- 
ity densities (see [25] )• The left sided centroid focuses on the highest mode (it is 
zero- forcing) , and the right-sided centroid tries to cover the support of both normals 
(it is zero-avoiding as depicted in Fig. 2 of |25j). 

Proposition 3.1. Assume there exists C > such that ^(x) < C for all x £ M , 
where ^{x) is defined in (|2.15l) . Assume furthermore that supp(/x) C B{xq,R) for 
some xq £ M and R > 0. Then x h- > S^^ix) has at least one global minimum in 
B(x ,C(l + C)R). 

Proof. We begin with establishing that for all yi,y 2 G M, 

(3-2) -^p{V2,yi) < p(yi,V2) < Cp(y 2 ,yi). 

It is sufficient to establish the second inequality and then to exchange y\ and y 2 . 
If 1 1-> tp(t) is a path from yi = tp(0) and y 2 — (p(l) then its length L(ip) satisfies 

L{<p) = yj<(f>(t),(p(t)> m dt 

f 1 I <-<p<t),-<p(t)>v(t) r — 772 — 7^7 

= / \ rr- J<-(f(t ),-<p (t)>_ m dt 

Jo V < ~ f{t)^-V>\i)>-<p{t) v 

< V(<p(t))yJ<-<p(t),-<p(t)>_ m dt 

<C J ^<-<p{t),-ip{t)>_ m dt 
= CL(0) 

where tp is the path from y 2 to y\ defined by (p(t) = ip(l — t). Minimizing over all 
paths ip from yi to y\ we get 

(3-3) p(yi,V2) < Cp(y 2 ,yi). 

Now if supp(/x) C B(xq,R) then ^j, p (xo) < R p ■ On the other hand, if x ^ 
B(x , C(l + C)R) then for all y G B(x ,R) 

p(x,y) > p(x,x ) - p(y,x ) 

> ^;P{xo,x) - Cp(x ,y) 

> (1 + C)R — CR = R 

and this clearly implies that &n,p(x) > R p - From this we get the conclusion. 

□ 

Concerning the uniqueness of the global minimum of <^ >p , we also have the 
following easy result. 

Proposition 3.2. Assume that p is supported by a compact ball B(xo, R), and that 
for all z G B(xo,R), the function x <-> p p (x,z) is strictly convex in B{xq,C{1 + 
C)R). Then p has a unique p-mean in B(xq, C(l + C)R). 
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Proof. If x <— > p p {x,z) is strictly convex for all z in the support of p then S^p is 
strictly convex, and this implies that it has a unique minimum, which is attained 
at a unique point e p . □ 

Corollary 3.3. Assume < k, ST > -S, & < C for some k,5 > 0, C > 1. Let 
p > 1. Again let 

R(p, k, S, C) = min ( ~7f arctan I TWl 




If p is supported by a geodesic ball B(xq,R) with 
(3-4) R< ^ R{p,k,S,C) 

and the injectivity radius at any x € B(xo,C(l + C)R) is strictly larger than 
R(p, k, 6, C) then p has a unique p-mean e p satisfying 

(3.5) epe B^x Q ,-^-jR(p,k,6,C) 

Proof. If x, z £ B(x , C(l + C)R) then 

P {x, z) < p(x, x ) + p{x 0l z) < (1 + CfCR < R{p, k, 6, C). 

Using proposition ^. 2[ we obtain that <ou,p is strictly convex on B(xq, C(l + C)R). 
So by proposition 13.21 ji has a unique p-mean in B(xq, (7(1 + C)R). □ 

Remark 3.4. Letting xo € M, D be a relatively compact neighborhood of Xo, 
then and ^ are bounded above on D by, say kr> and Cr>, and S? is bounded 
below on D by —5d- Using these bounds instead of k, C and S, we can find R 
sufficiently small so that the conditions of corollary 13.31 are fulfilled. So we can say 
any measure p with sufficiently small support has a unique p-mean. 

Remark 3.5. If M is a Cartan-Hadamard manifold, we recover the fact that we 
can take R(p, k, (5, C) as large as we want. 

More generally, in the Riemannian case, Afsari [T] proved existence and unique- 
ness of p- means, p > 1 on geodesic balls with radius r < — min -|inj(A/), — \ ^ 

p e [1, 2), and r < - min < inj(Af), 1 if p > 2. Even taking 6 = and C = 1 in 
2 L y/k J 

Corollary [231 the support of p has half the size of the one in Q] for p <G (1, 2) due to 
the fact that we have an additional condition Q3.4[) coming from the non optimality 
of Proposition 13.21 in the Riemannian context. As for p > 2 another factor two is 
gained in [T] with repeated use of Toponogov and Alexandroff theorems which are 
not available in our context. 

Remark 3.6. The condition on injectivity radius is the same as in the Riemannian 
case. The cut locus of any point of x £ B(xq, C(1+C)R) has to be at distance larger 
than R{p, k, 5, C). As for Riemannian manifold there is no general condition which 
insures this property for cut points, but for conjugate points the same condition 
holds, due to Rauch comparison theorem, see Theorem 9.6.1 in [7]. In the particular 
case when M is a Cartan-Hadamard Finsler manifold, i.e. it has nonpositive flag 
curvature and it is simply connected, then the injectivity radius in everywhere 
infinite (see Theorem 9.4.1 in [7]). 
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Proposition 3.7. Let a t— > x{a) solve the equation 

(3.6) x(0) = x andfora>0 x'{a) = grad^^-^,^). 

Under the conditions of Corollary \3.3l the path a i— > x(a) converges as a — > oo to 
the p-mean of ji. 

Proof. If f(a) = (—S' f _ t )(x(a)) we have as soon as grad x / a )(— <^,p) ^ 0, 

/'(a) = (da;(a)(-<^t,p),^(ffl)) 

= (^(a)(-^ M ,p) I g rad a; (a)(-^p)) 
= (d x (a) ( — <^/*,p)) (4r(a) ( _< ^,p))) 

= F* (d, (a) (-^ p )) 2 

by (I^DD and JUT]). 

On the other hand, we have /(0) > — R p and / is nondecreasing. This implies 
that for all a > 0, x(a) € B(xo,C(l + C)ii), since for all x £ B(x ,C(l + C)i?), 
<^^.,p{x) > R p . As a consequence x(a) has limit points as a goes to infinity, and 
since f(a) converges, any limit point is a critical point of x H ► #^p(x). But by 
Proposition 13.21 has a unique critical point in B(xq,C(1 + C)R) which is the 
mean e p of fj,. So we can conclude that x(a) converges to e p . □ 



4. Forward median 
Let fj, be a compactly supported probability measure in M. For x E M we define 

(4.1) jyx) = / p(x,z)[i(dz). 

J M 

The median of is the point in M where J^, reaches its minimum whenever it 
exists and is unique. 

Again we have the following result. 

Proposition 4.1. Assume there exists C > such that ^(x) < C for all x 6 M. 

Assume furthermore that supp(/i) C B(xq, R) for some xq 6 M and R > 0. Then 
x i — ^ ^u(x) feasi one global minimum in B(xq, C(l + C)R). 

Proposition 4.2. Assume that /i is supported by a compact ball B(xq,R), that the 
support of fi is not contained in a single geodesic and that for all z € B(xq, R), the 
forward distance to z is convex, and strictly convex in any geodesic of B(xq, C(l + 
C)R) which does not contain z. Then fj, has a unique median m € B(xq, C(1+C)R). 

Proof. Clearly under these assumptions is strictly convex, so it has a unique 
local minimum, this minimum is global and is attained at a unique point m € 
B(x ,C(l + C)R). □ 

Remark 4.3. Contrarily to the case of p- means for p > 1, we cannot say at this 
stage that any probability measure fi with sufficiently small support has a unique 
median, since we don't know whether is strictly convex or not. In the next 
proposition we give a sufficient condition for strict convexity of 
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Proposition 4.4. Assume Jtf < k and 2? > —6 for some k, 5 > 0. Assume that 
the injectivity radius at any point of B(xq, C(1 + C)R) is larger than (C 2 + C + 1)R. 
Define 

rj = min 

(4.2) 



' Vfccot (Vkp(TT(v), z)J <v N , v N >^ z n{dz) 



v G TM 



satisfying n(v) € B(x ,C(l + C)R), F{v) = l| 



where v is the normal part of v with respect to the vector n(v)z and the scalar 
product <•, '> ^7rj . If r\ — S > then is strictly convex on B{x$, C(l + C)R). 

More precisely, for all x £ B(xq,C(1 + C)R) and for all unit speed geodesic 7 
starting at x, 

(4.3) (J^ o 7 )"(0) > r) - 6. 
Proof. With the notations of section [51 from (I2.31[) we have 

(4.4) D'{(0) = r- 1 (E"(0) - <J T (0), J T (0)> T(0) ) . 

Let 7(s) = c(0,s), the unit speed geodesic with initial condition v = J(0), and 
f(s) = =^ M (7(s)). Equation JO} together with gives 

(4.5) f"(0)>S+ [ \^cot(ykp(n(v),z)^<v N ,v N >^ zt i(dz)>r]-5. 



From this we get the condition for the strict convexity of & 



□ 



For x € M define the measure fi x = fi — fi({x})5 x . Then the map y i-» ^ Alx (y) 
is differentiable at y = x. 
Since 

(4-6) Jyy) = (y) + M({4)p(y, as) 

and for v G T X M , is differentiable in the direction v with derivative 

(4.7) (d^, v) = (dJ?^ , v) + fi({x})F(-v), 

we see that a; is a local minimum of if and only if for all nonzero v € T X M 

(4.8) n({x})F(-v)>(d^ x ,-v) 
which is equivalent to 

(take -v - F fL i^'l l- But since F* = F o S£~ x we get 

Proposition 4.5. A point x in M is a local minimum of if and only if 

(4.10) ^({x})>F*(d^J. 

Note that for the Riemannian case this result is due to Le Yang |34) . 
Define the vector 

(4.11) ff(z) = gracL (y)) | y =,. 
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Alternatively, 

(4.12) H(x)=&- 1 I [ JS?( T^xi] n(dz)) . 

\Jm\{x} V PK x ,z) J J 

Let a i — y x(a) be the path in M defined by x(0) = xq and 

H(x(a)) if for all a' < a, fx({x{a')}) < F* (d&^A ; 
if for some a' < a, fj,({x(a')}) > F* (d^ x ^ . 

f(a)=^(x(a)). 

We have for the right derivative of / when x(a) is not a minimal point of J^,: 

f+(a) = (i( a )(\ w ),i(a))+/i(W«)})f(-i(«)) 
= - (d x{a) (^ (a) ) , %-\d x(a) (o) ))) 

+ »({x(a)})F (JSf-^dxW^w))) 
= —F* (4(a) (^ (o) )) 2 +M({x(a)})^(^- 1 (4(a)(^ (o) ))) • 

We get 

(4.i5) /;(o) = -f* (4(a) (** Kw^wD-cW)) 

which is negative as soon as a; (a) is not a minimal point of From this we get 
the following 

Proposition 4.6. Assume that [i is supported by a compact ball B(xq,R), that the 
support of fi is not contained in a single geodesic and that for all z € B(xq,R), the 
forward distance to z is convex, and strictly convex in any geodesic of B(xq, C(l + 
C)R) which does not contain z. Then the path a > x(a) converges to the median m 

of [l. 

Proof. Similar to the proof of proposition 13.71 □ 

5. An algorithm for computing p-means 

Lemma 5.1. Assume X > -f3 2 , ST < 6' <3 < D with f3 > 0, 5' > D > 1. For 

p > 1, r > 0, define 

(5.1) H{r) = H Pt/3iDiS ,(r) := pr p ~ 2 (D 2 max {{p - 1), r/3 coth(r/3)) + S'r) . 
If n is a probability measure on M with bounded support and x € M, define 

(5-2) Hfiix) = Hn#£ >D> s>(x) ■= H p> p tD ,6'(p{x,y))dfi. 

J M 

If ~b i — V 7(t) is a unit speed geodesic then for all t 
(5-3) (^o 7 ) # (t)<ff (I ( 7 (i)). 

Proof. For x,y € M, r = p{x,y), s M- j(s) — c(0, s) a unit speed geodesic started 
at x = c(0, 0), 1 1-> c(t, s) the geodesic satisfying c(l, s) = y, we have 

Dp(0) < pr p ~ 2 (D 2 max((p-l),r/3coth(r/3)) + 5'r) . 

Integrating with respect to y this equation gives the result. □ 



x(d] 

(4-13) 

x(a) 

Define 
(4.14) 
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Remark 5.2. If p > 2 or /i has a smooth density then the function is bounded 
on all compact sets. 

The main result is the following (see |21j for a similar result in a Riemannian 
manifold) . 

Proposition 5.3. Assume -/3 2 < X <k, -5 < ST < 5' ' , "a? < C and 9 < D for 
some /3, k, 5,5' > and C, D > 1. Let p > 1. Assume the support of fi is contained 
in B(xq, R) and is strictly convex on B(xq, C{C + 1)-R)- Assume furthermore 
that the function — H^ tPt p^r),5' is bounded on B(xq,C(C + 1)R) by a constant 
Ch > 0, and that the injectivity radius at any point of B(xq,C(C + l)R) is larger 
than C 2 + C + 1 . Define the gradient algorithm as follows: 

Step 1 Start from a point x± £ B(xq,C(C + 1)R) such that §^ v {x\) < RP (take 
for instance x\ — xq) and let k = 1. 
Step 2 Let 

/ p* y| \ grad(-^, p (x fc ))) F(grad(-^,p(x fc ))) 

(5.4) w fc = — — — — - — — , t k = . 

F(grad(-£^p(x fc ))) C H 

and let jk be the geodesic satisfying 7fc(0) — x k , 7fc(0) = Vk- Define 

(5.5) x k+ i = 7fc(*fe) 
then do again step 2 with k = k + 1 . 

T/ien i/ie sequence (xk)k>\ converges to e p . 

Proof. Wc first prove that the sequence (£' tlj p(xk)) k £N is nonincreasing. For this we 
write 

t 2 
2 



WTk(tk)) < <^,p(7fe(0)) + (d^, p , v k ) t k + C H ^ 



< ^, P (7fc(0)) - F(grad(-4 1 . p (x fe ))) ^-F (grad(-# MlP (a; fc ))) 

Off 



(5 - 6) , /F(grad(-^, p (x fc )))\ 2 



2 V Cfr 

« , rn » C ff ^ F(grad(-^, p (x fc ))) 
= ^, p (7fe(0)) - — I — 

This proves that the sequence is nonincreasing. As a consequence, for all k > 1, 
.T fe e B(x ,C*(C + 1)R), since S^ p {x k ) < RP and for all a; £ B(x , C(C + 1)R), 
S^ p (x) >RP. 

Next we prove that S^^ixk) converges to S' lljP (e p ). We know that ^(kj) 
converges to a > <o^ p {e p ). Extracting a subsequence a;^ converging to some Xoo £ 
B(xo,C(C + l)i?), this implies that tk t converges to 0. But this is possible only 
if Xoo = e p , which implies that a = S^^ie-p). As a consequence, any converging 
subsequence has e p as a limit, and this implies that Xk converges to e p . □ 

Remark 5.4. For this result we need the Hessian of <on tP to be bounded, and the 
subgradient algorithm in Riemannian manifolds as developed in 34. does not work. 
The reason is that for this algorithm, we would need to take 

g rad si4(-^,p( a; fc))) 



Vk 



^(grad^(-^, P (x fc ))) 



14 



M. ARNAUDON AND F. NIELSEN 



where gracLr^ denotes the gradient with respect to the metric <•, ■>^^- So we 
would need to know e p \ 

Corollary 5.5. Let p = 2. If R < = arctan -— - or M has non- 

C{C + l) 2 Vk \C5 2 j 

positive flag curvature, then the algorithm of Proposition 15. 3\ can be applied with 

the appropriate constants 

Proof. With this assumption, by Proposition ^. 2l the function is strictly convex 
on B(x Q ,C(C + 1)R). □ 
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